Outline

Intro Foul Ball Leaders Batters Pitchers Analysis 0/1 Strike vs. 2 Strike 0/1 Strike vs. K% 2 Strike vs. K% Team Foul % Scatterplot See if anyone like Acuna’s K% dropped/rose due to foul ball change Hardest hit foul balls? Dive into any players with interesting numbers like Jazz, Yordan, etc.

# Setup
library(baseballr)
library(DBI)
library(bigrquery)
library(dplyr)
library(scales)
library(plotly)
library(knitr)
library(kableExtra)
library(ggimage)
library(jsonlite)

# BigQuery
bq <- dbConnect(bigrquery::bigquery(),
                project = "pjb-sports-data",
                dataset = "mlb")

# Load Statcast Data
statcast_leaderboard <- list()
for (player_type in c("batter", "pitcher")) {
  statcast_leaderboard[[player_type]] <- statcast_leaderboards(
    leaderboard = "expected_statistics", year = 2023, min_pa = 1,
    player_type = player_type
  )
}

Intro

We talk often about player tendencies to achieve certain outcomes… a hitter’s spray chart, a hitter’s launch angle breakdown, a pitcher’s pitch distribution, a pitcher’s fly ball to ground ball ratio, etc. However, there is a common event that often gets overlooked in baseball analysis: the foul ball. Every now and then, a stat about foul balls will emerge, such as how Joey Votto pulled one foul ball into the seats over the first 2,138 plate appearances of his career, but more attention is lent to the balls that are put in play, watched or whiffed at.

ggplot(
  dbGetQuery(bq, 'SELECT
                    description,
                    count / total `%`
                  FROM
                    (
                      SELECT
                        1 foo,
                        description,
                        COUNT(*) count
                      FROM
                        (
                          SELECT
                            REGEXP_REPLACE(description, "_*blocked_*", "") description,
                          FROM
                            `mlb.statcast_pitches`
                          WHERE
                            game_year = 2023 AND game_type = "R"
                        )
                      GROUP BY
                        description
                    )
                  JOIN
                    (
                      SELECT
                        1 foo,
                        COUNT(*) total
                      FROM
                        `mlb.statcast_pitches`
                      WHERE
                        game_year = 2023 AND game_type = "R"
                    )
                  USING
                    (foo);'),
  aes(reorder(description, -`%`), `%`)
) +
  geom_bar(stat = "identity", fill = "#0099f9") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "2023 Pitch Results", x = "", y = "Frequency") +
  scale_y_continuous(labels = percent, limits = c(0, 0.4)) +
  geom_text(aes(label = percent(`%`, accuracy = 0.1)), vjust = -0.5, size = 3)

Foul balls make up nearly 18% of pitch outcomes and can be interpreted in a number of different ways. With less than two strikes… * the pitch wasn’t really what the batter was looking for, but he swung anyway. * the batter’s swing or timing was slightly off. With two strikes… * the batter is just trying to stay alive with two strikes. * the pitch wasn’t located well enough or wasn’t deceptive enough to miss the bat entirely.

Fouls can feel like a neutral outcome when they happen on the field, but they are almost always a positive outcome for either the pitcher or the hitter. For example, with less than two strikes, a foul ball is a positive outcome for the pitcher. Certain foul balls, like a towering drive that hooks foul, can indicate that the batter has the upper hand, but it still counts like any other strike, and the pitcher can breath a sigh of relief. When there are two strikes, a foul ball is always going to feel like the batter held strong, and the pitcher is annoyed that he has to throw another pitch. In short, a foul ball is a win for the pitcher with less than 2 strikes, and it is a win for the hitter with 2 strikes. With this in mind, I wanted to split hitter and pitcher foul ball tendencies based on the count.

batters.2023.df <- statcast_leaderboard$batter %>%
  filter(pa >= 200) %>%
  mutate(Name = sub("(.+),\\s(.+)","\\2 \\1", `last_name, first_name`)) %>%
  select(-year, -`last_name, first_name`, -est_ba_minus_ba_diff,
         -est_slg_minus_slg_diff, -est_woba_minus_woba_diff) %>%
  rename(PA = pa) %>%
  # All pitches
  inner_join(
    dbGetQuery(bq, 'SELECT
                      batter player_id,
                      COUNTIF(description IN ("swinging_strike", "foul_tip", "swinging_strike_blocked")) / COUNT(*) `0|1-Strike Whiff %`,
                      COUNTIF(description = "foul") / COUNT(*) `0|1-Strike Foul %`
                    FROM
                      `mlb.statcast_pitches`
                    WHERE
                      game_year = 2023 AND game_type = "R" AND strikes < 2 AND
                      description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
                    GROUP BY
                      batter;'),
    by = "player_id"
  ) %>%
  # 2-Strikes
  inner_join(
    dbGetQuery(bq, 'SELECT
                      batter player_id,
                      COUNTIF(description IN ("swinging_strike", "foul_tip", "swinging_strike_blocked")) / COUNT(*) `2-Strike Whiff %`,
                      COUNTIF(description = "foul") / COUNT(*) `2-Strike Foul %`
                    FROM
                      `mlb.statcast_pitches`
                    WHERE
                      game_year = 2023 AND game_type = "R" AND strikes = 2 AND
                      description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
                    GROUP BY
                      batter;'),
    by = "player_id"
  ) %>%
  rename(`0/1-Strike Whiff %` = `0|1-Strike Whiff %`,
         `0/1-Strike Foul %` = `0|1-Strike Foul %`)

pitchers.2023.df <- statcast_leaderboard$pitcher %>%
  filter(pa >= 200) %>%
  mutate(Name = sub("(.+),\\s(.+)","\\2 \\1", `last_name, first_name`)) %>%
  select(-year, -`last_name, first_name`, -est_ba_minus_ba_diff,
         -est_slg_minus_slg_diff, -est_woba_minus_woba_diff) %>%
  rename(TBF = pa) %>%
  # All pitches
  inner_join(
    dbGetQuery(bq, 'SELECT
                      pitcher player_id,
                      COUNTIF(description IN ("swinging_strike", "foul_tip", "swinging_strike_blocked")) / COUNT(*) `0|1-Strike Whiff %`,
                      COUNTIF(description = "foul") / COUNT(*) `0|1-Strike Foul %`
                    FROM
                      `mlb.statcast_pitches`
                    WHERE
                      game_year = 2023 AND game_type = "R" AND strikes < 2 AND
                      description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
                    GROUP BY
                      pitcher;'),
    by = "player_id"
  ) %>%
  # 2-Strikes
  inner_join(
    dbGetQuery(bq, 'SELECT
                      pitcher player_id,
                      COUNTIF(description IN ("swinging_strike", "foul_tip", "swinging_strike_blocked")) / COUNT(*) `2-Strike Whiff %`,
                      COUNTIF(description = "foul") / COUNT(*) `2-Strike Foul %`
                    FROM
                      `mlb.statcast_pitches`
                    WHERE
                      game_year = 2023 AND game_type = "R" AND strikes = 2 AND
                      description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
                    GROUP BY
                      pitcher;'),
    by = "player_id"
  ) %>%
  rename(`0/1-Strike Whiff %` = `0|1-Strike Whiff %`,
         `0/1-Strike Foul %` = `0|1-Strike Foul %`)

2023 Batters (min. 200 PAs)

0/1-Strike Foul %

2-Strike Foul %

10 Highest

kable(
  batters.2023.df %>%
    select(Name, PA, `0/1-Strike Foul %`) %>%
    arrange(desc(`0/1-Strike Foul %`)) %>%
    mutate(
      `0/1-Strike Foul %` = percent(`0/1-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name PA 0/1-Strike Foul %
Jake Cronenworth 522 48.78%
Isaac Paredes 571 48.43%
Bo Naylor 230 46.54%
Ozzie Albies 660 45.89%
Nathaniel Lowe 724 45.59%
Gio Urshela 228 45.05%
Zach McKinstry 518 45.03%
Max Kepler 491 44.52%
Adley Rutschman 687 44.26%
Pavin Smith 228 44.20%
kable(
  batters.2023.df %>%
    select(Name, PA, `2-Strike Foul %`) %>%
    arrange(desc(`2-Strike Foul %`)) %>%
    mutate(
      `2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name PA 2-Strike Foul %
Yordan Alvarez 496 46.95%
Sal Frelick 223 46.75%
Geraldo Perdomo 495 46.02%
Justin Turner 626 45.81%
Ty France 665 45.59%
Isiah Kiner-Falefa 361 45.55%
Cody Bellinger 556 45.43%
Daulton Varsho 581 45.11%
Will Smith 554 44.87%
Alec Burleson 347 44.87%

10 Lowest

kable(
  batters.2023.df %>%
    select(Name, PA, `0/1-Strike Foul %`) %>%
    arrange(`0/1-Strike Foul %`) %>%
    mutate(
      `0/1-Strike Foul %` = percent(`0/1-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name PA 0/1-Strike Foul %
Christian Bethancourt 332 29.48%
William Contreras 611 29.57%
Eloy Jiménez 489 29.92%
Javier Báez 547 30.26%
Kevin Kiermaier 408 30.29%
Aaron Judge 458 30.75%
Jose Siri 364 30.82%
Joey Wiemer 410 30.87%
Jordan Walker 465 30.91%
Luke Raley 406 31.07%
kable(
  batters.2023.df %>%
    select(Name, PA, `2-Strike Foul %`) %>%
    arrange(`2-Strike Foul %`) %>%
    mutate(
      `2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name PA 2-Strike Foul %
Jazz Chisholm Jr.  383 23.53%
Mark Vientos 233 25.32%
Jose Siri 364 27.57%
Eloy Jiménez 489 27.94%
Harrison Bader 344 28.64%
Brett Baty 389 28.69%
Francisco Alvarez 423 28.87%
Christopher Morel 429 29.08%
Mickey Moniak 323 29.51%
Paul DeJong 400 29.64%
# https://plotly.com/ggplot2/configuration-options/
# Highlight POIs: https://thiyanga.netlify.app/post/scatterplot/
batter.scatter.plot.df <- batters.2023.df %>%
  mutate(diff = `2-Strike Foul %` - `0/1-Strike Foul %`,
         color = as.factor(
           ifelse(diff >= 0.06, "#025189",
                  ifelse(diff >= 0.03, "#0c9cb4",
                         ifelse(diff >= 0, "#94c280",
                                ifelse(diff >= -0.03, "#f1c359",
                                       ifelse(diff >= -0.06, "#d03f2e",
                                              "#982123"))))))) %>%
  select(Name, `2-Strike Foul %`, `0/1-Strike Foul %`, color)

ggplotly(
  ggplot(
    batter.scatter.plot.df,
    aes(x = `0/1-Strike Foul %`, y = `2-Strike Foul %`,
        text = paste(Name, "\n0/1-Strike Foul %: ",
                     percent(`0/1-Strike Foul %`, accuracy = 0.1),
                     "\n2-Strike Foul %: ",
                     percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
    geom_point(aes(color = color)) +
    scale_colour_manual(values = levels(batter.scatter.plot.df$color)) +
    labs(title = "2023 Batters (min. 200 PAs)") +
    scale_x_continuous(labels = percent) +
    scale_y_continuous(labels = percent) +
    geom_abline(linetype = "dotted") +
    theme(legend.position = "none"),
  tooltip = "text") %>%
  layout(annotations = list(
    list(text = "Hover over any point for player details", x = 0.437, y = 0.462,
         font = list(size = 10)),
    list(text = "y = x", x = 0.484, y = 0.476, font = list(size = 10),
         showarrow = FALSE))) %>%
  config(displayModeBar = FALSE)

There is a positive correlation between 0/1-Strike Foul % and 2-Strike Foul % (correlation coefficient of 0.42), but substantial variance exists, too. Take Yordan Alvarez, for example. With 0 or 1 strike, he fouls off only 34.1% of pitches, but that number increases to 46.9% with 2 strikes, which is the highest in all of MLB.

# ggplotly(
  ggplot(
    dbGetQuery(bq, 'WITH
                      team_foul_pct
                    AS
                      (
                        SELECT
                          CASE inning_topbot WHEN "top" THEN away_team ELSE home_team END team,
                          strikes,
                          COUNTIF(description = "foul") fouls,
                          COUNT(*) pitches,
                        FROM
                          `mlb.statcast_pitches`
                        WHERE
                          game_year = 2023 AND game_type = "R" AND
                          description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
                        GROUP BY
                          team,
                          strikes
                      )
                    
                    SELECT
                      team,
                      `0|1-Strike Foul %`,
                      `2-Strike Foul %`
                    FROM
                      (
                        SELECT
                          team,
                          SUM(fouls) / SUM(pitches) `0|1-Strike Foul %`
                        FROM
                          team_foul_pct
                        WHERE
                          strikes < 2
                        GROUP BY
                          team
                      )
                    JOIN
                      (
                        SELECT
                          team,
                          SUM(fouls) / SUM(pitches) `2-Strike Foul %`
                        FROM
                          team_foul_pct
                        WHERE
                          strikes = 2
                        GROUP BY
                          team
                      )
                    USING
                      (team);') %>%
      rename(`0/1-Strike Foul %` = `0|1-Strike Foul %`) %>%
      left_join(fromJSON("https://statsapi.mlb.com/api/v1/teams?lang=en&sportId=1&season=2023")$teams %>%
                  mutate(logo = paste("https://www.mlbstatic.com/team-logos/", id, ".svg", sep = "")) %>%
                  select(abbreviation, logo),
                by = c("team" = "abbreviation")),
    aes(x = `0/1-Strike Foul %`, y = `2-Strike Foul %`,
        text = paste(team, "\n0/1-Strike Foul %: ",
                     percent(`0/1-Strike Foul %`, accuracy = 0.1),
                     "\n2-Strike Foul %: ",
                     percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
    geom_image(aes(image = logo), size = 0.075, by = "height") +
    labs(title = "2023 Teams (Batting)") +
    scale_x_continuous(labels = percent) +
    scale_y_continuous(labels = percent) +
    geom_abline(linetype = "dotted") +
    theme(legend.position = "none")#,

  # tooltip = "text") %>%
  # layout(annotations = list(
  #   list(text = "Hover over any point for team details", x = 0.377, y = 0.3965,
  #        font = list(size = 10)),
  #   list(text = "y = x", x = 0.388, y = 0.39, font = list(size = 10),
  #        showarrow = FALSE))) %>%
  # config(displayModeBar = FALSE)

2023 Pitchers (min. 200 batters faced)

0/1-Strike Foul %

2-Strike Foul %

10 Highest

kable(
  pitchers.2023.df %>%
    select(Name, TBF, `0/1-Strike Foul %`) %>%
    arrange(desc(`0/1-Strike Foul %`)) %>%
    mutate(
      `0/1-Strike Foul %` = percent(`0/1-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name TBF 0/1-Strike Foul %
Brusdar Graterol 257 45.57%
Chris Murphy 212 45.31%
Luis Severino 417 44.93%
Johnny Cueto 218 44.36%
Nestor Cortes 266 44.34%
Steven Wilson 219 44.21%
Joe Ryan 672 44.04%
Brad Hand 236 44.00%
Louie Varland 283 43.34%
Cody Bradford 234 43.08%
kable(
  pitchers.2023.df %>%
    select(Name, TBF, `2-Strike Foul %`) %>%
    arrange(desc(`2-Strike Foul %`)) %>%
    mutate(
      `2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name TBF 2-Strike Foul %
Ron Marinaccio 205 51.41%
Josh Hader 231 47.62%
Sean Manaea 499 46.85%
Drew Smith 244 46.73%
Reynaldo López 278 46.33%
Brock Burke 250 46.07%
Johnny Cueto 218 45.71%
Kyle Muller 372 45.66%
Jhony Brito 372 45.59%
Jake Irvin 530 45.57%

10 Lowest

kable(
  pitchers.2023.df %>%
    select(Name, TBF, `0/1-Strike Foul %`) %>%
    arrange(`0/1-Strike Foul %`) %>%
    mutate(
      `0/1-Strike Foul %` = percent(`0/1-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name TBF 0/1-Strike Foul %
Robert Stephenson 201 25.29%
Elvis Peguero 252 25.82%
Andrés Muñoz 211 27.80%
Giovanny Gallegos 229 28.28%
Alex Lange 288 28.71%
Josh Sborz 215 28.92%
Gregory Santos 289 28.96%
Bryan Abreu 287 29.15%
Josh Fleming 221 29.39%
Alex Young 236 29.79%
kable(
  pitchers.2023.df %>%
    select(Name, TBF, `2-Strike Foul %`) %>%
    arrange(`2-Strike Foul %`) %>%
    mutate(
      `2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
    ) %>%
    head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
Name TBF 2-Strike Foul %
Alex Lange 288 25.00%
Yency Almonte 207 25.86%
Phil Maton 274 26.63%
Mark Leiter Jr.  269 26.83%
Alexis Díaz 286 27.66%
Quinn Priester 234 28.00%
Jordan Romano 248 28.12%
Gregory Soto 250 28.26%
Albert Abreu 268 28.66%
Drew VerHagen 268 28.82%
# https://plotly.com/ggplot2/configuration-options/
# Highlight POIs: https://thiyanga.netlify.app/post/scatterplot/
pitcher.scatter.plot.df <- pitchers.2023.df %>%
  mutate(diff = `2-Strike Foul %` - `0/1-Strike Foul %`,
         color = as.factor(
           ifelse(diff >= 0.06, "#982123",
                  ifelse(diff >= 0.03, "#d03f2e",
                         ifelse(diff >= 0, "#f1c359",
                                ifelse(diff >= -0.03, "#94c280",
                                       ifelse(diff >= -0.06, "#0c9cb4",
                                              "#025189"))))))) %>%
  select(Name, `2-Strike Foul %`, `0/1-Strike Foul %`, color)

ggplotly(
  ggplot(
    pitcher.scatter.plot.df,
    aes(x = `0/1-Strike Foul %`, y = `2-Strike Foul %`,
        text = paste(Name, "\n0/1-Strike Foul %: ",
                     percent(`0/1-Strike Foul %`, accuracy = 0.1),
                     "\n2-Strike Foul %: ",
                     percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
    geom_point(aes(color = color)) +
    scale_colour_manual(values = levels(pitcher.scatter.plot.df$color)) +
    labs(title = "2023 Pitchers (min. 200 batters faced)") +
    scale_x_continuous(labels = percent) +
    scale_y_continuous(labels = percent) +
    geom_abline(linetype = "dotted") +
    theme(legend.position = "none"),
  tooltip = "text") %>%
  layout(annotations = list(
    list(text = "Hover over any point for player details", x = 0.316, y = 0.444,
         font = list(size = 10)),
    list(text = "y = x", x = 0.461, y = 0.47, font = list(size = 10),
         showarrow = FALSE))) %>%
  config(displayModeBar = FALSE)

Do foul ball tendencies relate to K%

Hard hit foul balls?

Since Statcast still tracks the exit velocity of foul balls, I wanted to examine not just the frequency with which players hit foul balls, but also how hard they are hitting them (Avg. EV and Hard Hit %)

Players of Interest

dbDisconnect(bq)